pacman::p_load(sf, sfdep, tmap, tidyverse, knitr, dplyr, readr)Take home Exercise 1: Geospatial Analytics for Public Good
1 Overview
With the recent trend of massive deployment of pervasive computing technologies such as GPS and RFID on the vehicles, city-wide urban infrastructures such as buses, taxis, mass rapid transit, public utilities and roads become digital. The datasets obtained are likely to contain structure and patterns that provide useful information about characteristics of the measured phenomena. These will potentially contribute to a better urban management and useful information for urban transport services providers both from the private and public sector to formulate informed decision to gain competitive advantage.
2 Objective
However, in real-world practice, the use of these data tend to be confined to simple tracking and mapping with GPS applications. Exploratory Spatial Data Analysis (ESDA) hold tremendous potential to address complex problems facing society. In this exercise, we will apply appropriate Local Indicators of Spatial Association (GLISA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
3 Getting Started
3.1 Loading R packages
The code chunk below installs and loads sf, sfdep, tmap, tidyverse, knitr, dplyr, mapview, readr packages into R environment.
3.2 The data
In this study, three datasets will be used:
Aspatial data: Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall.
Geospatial data:
Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
3.3 Importing the aspatial data
Firstly, we will import the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall by using read_csv() of readr package. Since there are three months of data, we choose only one of them to perform our analysis.
odbus <- read_csv("data/aspatial/origin_destination_bus_202308.csv")Then, we use glimpse() to check of odbus tibble data frame.
glimpse(odbus)Rows: 5,709,512
Columns: 7
$ YEAR_MONTH <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "4406…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "1722…
$ TOTAL_TRIPS <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …
We can see that column ORIGIN_PT_CODE and DESTINATION_PT_CODE are in chr data type. For further processing, we should convert these data values into factor data type.
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) Notice that both of them are in factor data type now.
glimpse(odbus)Rows: 5,709,512
Columns: 7
$ YEAR_MONTH <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <fct> 04168, 04168, 80119, 80119, 44069, 44069, 20281, 2…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 17229, 20141, 2…
$ TOTAL_TRIPS <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …
3.4 Importing the geospatial data
According to epsg.io, Singapore’s coordinate system is SVY21 with EPSG 3414. So we needto re-assign EPSG to 3414 by using st_transform().
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`C:\Stella12121\ISSS624\Take-home_Ex1\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
The structure of busstop sf tibble data frame looks as below.
glimpse(busstop)Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
Rename the column ‘BUS_STOP_N’ to ‘ORIGIN_PT_CODE’ for easy join with odbus dataset.
busstop <- busstop %>% rename(ORIGIN_PT_CODE = BUS_STOP_N)4 Data wrangling
4.1 Extracting the study data
For the purpose of this exercise, we will extract commuting flows during the weekday morning peak, weekday afternoon peak, weekend/holiday morning peak and evening peak with the reference to the time intervals provided in the table below.

Then, we use filter() function to extract the data according to the required period.
origin6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))origin17_20 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 &
TIME_PER_HOUR <= 20) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))origin11_14 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 &
TIME_PER_HOUR <= 14) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))origin16_19 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 &
TIME_PER_HOUR <= 19) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))4.2 Combining busstop and hexagon layer
First, using st_make_grid() to create the hexagon layer.
Show the code
area_honeycomb_grid = st_make_grid(busstop, cellsize = 500, what = "polygons", square = FALSE)
# To sf and add grid ID
honeycomb_grid_sf = st_sf(area_honeycomb_grid) %>%
# add grid ID
mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))For hexagonal cells, cellsize is defined as the distance between opposite edges. In this case, we should use 250*2 = 500
Second, we populate the planning grid_id of honeycomb_grid_sf sf data frame into busstop sf data frame.
busstop_grid <- st_intersection(busstop, honeycomb_grid_sf) %>%
select(ORIGIN_PT_CODE, grid_id) %>%
st_drop_geometry()st_intersection()is used to perform point and polygon overly and the output will be in point sf object.select()of dplyr package is then use to retain only ORIGIN_PT_CODE and grid_id in the busstop_grid sf data frame.five bus stops are excluded in the resultant data frame because they are outside of Singapore bpundary.
Third,we are going to append the grid_id from busstop_grid data frame onto origin6_9, origin17_20, origin11_14 and origin16_19 data frame seperately.
origin_WM <- left_join(origin6_9 , busstop_grid,
by = "ORIGIN_PT_CODE" ) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_SZ, ORIGIN_BS) %>%
summarise(TOT_TRIPS = sum(TRIPS))origin_WA <- left_join(origin17_20 , busstop_grid,
by = "ORIGIN_PT_CODE" ) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_SZ, ORIGIN_BS) %>%
summarise(TOT_TRIPS = sum(TRIPS))origin_WHM <- left_join(origin11_14 , busstop_grid,
by = "ORIGIN_PT_CODE" ) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_SZ, ORIGIN_BS) %>%
summarise(TOT_TRIPS = sum(TRIPS))origin_WHE <- left_join(origin16_19 , busstop_grid,
by = "ORIGIN_PT_CODE" ) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_SZ, ORIGIN_BS) %>%
summarise(TOT_TRIPS = sum(TRIPS))Before continue, it is a good to check for duplicating records.
duplicate_1 <- origin_WM %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate_2 <- origin_WA %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate_3 <- origin_WHM %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate_4 <- origin_WHE %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()If duplicated records are found, the code chunk below will be used to retain the unique records.
origin_WM <- unique(origin_WM)
origin_WA <- unique(origin_WA)
origin_WHM <- unique(origin_WHM)
origin_WHE <- unique(origin_WHE)Lastly,
origintrip_hexagon_WM <- left_join(honeycomb_grid_sf,
origin_WM,
by = c("grid_id" = "ORIGIN_SZ"))
origintrip_hexagon_WA <- left_join(honeycomb_grid_sf,
origin_WA,
by = c("grid_id" = "ORIGIN_SZ"))
origintrip_hexagon_WHM <- left_join(honeycomb_grid_sf,
origin_WHM,
by = c("grid_id" = "ORIGIN_SZ"))
origintrip_hexagon_WHE <- left_join(honeycomb_grid_sf,
origin_WHE,
by = c("grid_id" = "ORIGIN_SZ"))Using filter() to drop the NA in dataset.
origintrip_hexagon_WM = filter(origintrip_hexagon_WM, TOT_TRIPS > 0)
origintrip_hexagon_WA = filter(origintrip_hexagon_WA, TOT_TRIPS > 0)
origintrip_hexagon_WHM = filter(origintrip_hexagon_WHM, TOT_TRIPS > 0)
origintrip_hexagon_WHE = filter(origintrip_hexagon_WHE, TOT_TRIPS > 0)5 Geographical distribution of the passenger trips
In this section, we will plot the grid into an interactive thematic map with tmap.
5.1 weekday morning peak
Show the code
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
tm_shape(origintrip_hexagon_WM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Total_trips of weekday morning trip") +
tm_layout(main.title = "Passenger trips generated at hexagon level",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2) 5.2 weekday afternoon peak
Show the code
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
tm_shape(origintrip_hexagon_WA)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Total_trips of weekday afternoon trip") +
tm_layout(main.title = "Passenger trips generated at hexagon level",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2) 5.3 weekend/holiday morning peak
Show the code
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
tm_shape(origintrip_hexagon_WHM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Total_trips of weekend/holiday morning peak") +
tm_layout(main.title = "Passenger trips generated at hexagon level",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2) 5.4 weekend/holiday evening peak
Show the code
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
tm_shape(origintrip_hexagon_WHE)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Total_trips of weekend/holiday evening peak") +
tm_layout(main.title = "Passenger trips generated at hexagon level",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)